Abstract


In metabolomics, in addition to determining the statistical significance of individual metabolites, it is 
important to determine the statistical significance of metabolic pathways or other groupings of 
metabolites. There are two classes of tests for determining the statistical significance of metabolic 
pathways—competitive tests and self-contained tests. Competitive tests assess whether the pathway of 
interest is “more changed” than the remaining pathways, while self-contained tests assess the statistical 
significance of the pathway regardless of the effects in the other pathways. The latter tests are typically 
more useful for assessing the statistical significance of metabolic pathways. Fisher's meta-analysis 
statistic, which combines p-values, is an example of a self-contained test. However, the distribution of 
the statistic does not have a closed form when the p-values are correlated. An empirical p-value 
may be derived by randomly shuffling the group labels a large number of times in order to create an 
empirical null distribution. Unfortunately, this process can be very time-consuming, especially for more 
complex statistical designs. Additionally, the smallest empirical p-value is limited by the number of 
permutations, which can affect methods that adjust for multiple comparisons, and this also makes 
ranking the pathways less precise. To eliminate these issues, an approximation of this distribution was 
applied. Simulation studies showed comparable performance to the permutation-derived test except for 
cases where the correlations were very low and sample sizes were high, where the results would be 
more conservative. However, for applications to real data, such scenarios are unlikely because 
metabolites that are related typically have at least modest correlations. This was seen in the application 
to a human metabolomics data set where the results for the approximation were very close to those 
derived from the permutation-derived p-values. 


Introduction 


In omics sciences, one can assess the statistical significance of individual variables, but the 
statistical significance of the biological pathways is often of greater interest. The focus of this paper 
is on metabolomics, and a “pathway” will refer to a physical pathway or any biosignature or combination 
of metabolites. A common test for pathways is a test for enrichment or “overrepresentation,” such as one 
based on probabilities from the hypergeometric distribution (Rivals I, 2007). This type of test is 
referred to as a “competitive” test (Evangelou M, 2012) since it assesses whether the pathway of interest 
is “more changed” than the remaining pathways. However, of more interest for metabolomics studies are 
“self-contained” tests (Fridley BL, 2010), which test whether the pathway has changed, regardless of the 
changes in other pathways. More formally, a self-contained test assesses whether the mean vector is 
different between the groups of interest. In (Mitchell, 2015), it was shown that tests that combine 
p-values performed at least as well as multivariate statistics. In particular, Fisher's statistic 
(Fisher, 1932) with p-values determined from a permutation distribution performed well and was the 
recommended method. More formally, for a pathway consisting of $m$ metabolites, let $p_1, p_2, \ldots, p_m$ 
represent the p-values for the statistical test for each of the individual metabolites. Let 
$X_i = -2\log(p_i)$, and let $X = \sum_{i=1}^{m} X_i$ represent the sum, which is Fisher's statistic used 
for meta-analysis (Fisher, 1932). If the $\{X_i\}$ are independent, then $X$ has a chi-squared distribution 
with $2m$ degrees of freedom. However, if the $\{X_i\}$ are not independent, the distribution of $X$ does 
not have a closed form. Thus, in (Mitchell, 2015), p-values were determined from a permutation 
distribution as follows: for a test of two independent groups, for example A vs. B, randomly shuffle the 
group labels, compute $X$, and repeat a large number of times. The empirical p-value is then the number of 
permuted values of $X$ that are at least as large as the observed value divided by the total number of 
permutations. 
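
As a minimal illustration of the quantities just defined, the following Python sketch computes Fisher's statistic, its chi-squared p-value under independence, and an empirical p-value from a list of permuted statistics. The function names and the example p-values are hypothetical. The closed-form tail is available here because the degrees of freedom, 2m, are always even. Generating the permuted statistics (reshuffling the labels and rerunning the per-metabolite tests) is not shown.

```python
import math


def fisher_statistic(pvalues):
    """Fisher's statistic: X = sum over metabolites of -2 * log(p_i)."""
    return sum(-2.0 * math.log(p) for p in pvalues)


def chi2_sf_even_df(x, m):
    """P(chi-squared with 2m df > x); closed form that holds for even df."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def empirical_p_value(observed, permuted):
    """Fraction of permuted statistics at least as large as the observed one."""
    return sum(1 for x in permuted if x >= observed) / len(permuted)


# Hypothetical per-metabolite p-values for a pathway with m = 3 metabolites.
pvals = [0.04, 0.10, 0.30]
x_obs = fisher_statistic(pvals)
p_if_independent = chi2_sf_even_df(x_obs, len(pvals))
print(f"X = {x_obs:.3f}, p-value assuming independence = {p_if_independent:.4f}")
```

The independence p-value is only a reference point; when the metabolites are correlated it is the empirical p-value, computed from statistics obtained under shuffled labels, that the permutation approach reports.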


Although this works well, this has some limitations: 

(1) The runtime can be very long. 

(2) The smallest p-value is limited by the number of permutations. For example, if there are 10,000 
permutations and the test statistic is greater than all 10,000 permuted values, then the p-value 
can only be reported as “<0.0001”. More precision can be useful if Bonferroni adjustments are desired or 
for ranking the pathways. 

(3) The code has to be adapted to each statistical test of interest, and some of these analyses can take 
much longer than simpler ones. For example, a two-way repeated-measures ANOVA with contrasts can have a 
much longer runtime than a two-sample t-test. Thus, it would be very useful to have an approximation 
that one could use instead. 
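
The resolution limit in point (2) is simple arithmetic, sketched below in Python. The function names are hypothetical, and the figure of 39 pathways is only an example value. The sketch checks whether the bound "<1/B" that B permutations can report is small enough to fall at or below a Bonferroni threshold of alpha divided by the number of pathways tested.

```python
def smallest_reportable_p(n_permutations):
    """Finest bound a permutation test can report: '< 1/B' for B permutations."""
    return 1.0 / n_permutations


def bound_is_below_threshold(n_permutations, alpha, n_tests):
    """True if the bound 1/B is at or below the Bonferroni threshold alpha/n_tests."""
    return smallest_reportable_p(n_permutations) <= alpha / n_tests


# Example values (assumptions for illustration): alpha = 0.05, 39 pathways.
for n_perm in (500, 10_000):
    ok = bound_is_below_threshold(n_perm, alpha=0.05, n_tests=39)
    print(n_perm, smallest_reportable_p(n_perm), ok)
```

Even when the bound clears the threshold, all pathways at the bound are tied, so they cannot be ranked against each other.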


Proposed Approximation and Simulation Studies 


Ferrari, et al. propose an approximation under the following conditions (Ferrari, 2019): if the $\{X_i\}$ 
are not independent, then $X$ has an approximate gamma distribution, 
$\Gamma\left(\frac{2m}{\nu}, \nu\right)$, where $\nu = 2\left(1 + \frac{2}{m}\sum_{i<j}\rho_{ij}\right)$ 
and $\rho_{ij}$ is the correlation between $X_i$ and $X_j$. Thus, we need to determine the $\{\rho_{ij}\}$ 
before applying this formula, i.e., we need to determine the correlation of $\log(p_i)$ and $\log(p_j)$. 
We simulated various null scenarios for two variables (two tests) using the pooled two-sample t-test and 
Welch’s two-sample t-test. We used values of (-0.9, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 0.9) for the 
correlation between the two variables and two groups with (3, 5, 10, 20, 30, 50) observations per group. 
Each of these combinations was tested under four scenarios: (1) pooled two-sample t-test with population 
standard deviations = 0.4 for each group, (2) Welch’s two-sample t-test (allows for unequal variances) 
with population standard deviations of 0.4 each, (3) Welch’s two-sample t-test with population standard 
deviations of sqrt(0.125) and 0.5 (the ratio of the variances = 2), and (4) Welch’s two-sample t-test with 
population standard deviations of 0.3 and 0.6. All had means = 0, and for all four scenarios, Gaussian 
(normal) random variables were simulated. For each scenario and parameter combination, 10,000 simulations 
were performed. The results are shown in Table 1. From this table, one can see that a good approximation 
of the correlation of $\log(p_i)$ and $\log(p_j)$ is the square of the correlation of the original 
variables themselves, especially when there are at least 5 observations for each group. Henceforth, we 
refer to this statistic, which uses Ferrari’s gamma approximation and the aforementioned approximation of 
the correlation of the log p-values, as the “Fisher-Ferrari test.” 
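
The test described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation (their computations were done in R). It assumes the gamma approximation has shape 2m/ν and scale ν, which matches the mean 2m and variance 4m + 8Σρ_ij of X. It also assumes the correlation matrix of the original metabolites is supplied by the caller, and it uses a textbook series and continued-fraction evaluation of the incomplete gamma function so that only the standard library is needed. All function names are hypothetical.

```python
import math


def gamma_sf(x, shape, scale):
    """P(G > x) for G ~ Gamma(shape, scale), via the regularized incomplete gamma."""
    if x <= 0.0:
        return 1.0
    a, z = shape, x / scale
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series expansion of the lower tail, then take the complement.
        denom = a
        term = total = 1.0 / a
        for _ in range(1000):
            denom += 1.0
            term *= z / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for the upper tail.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(log_prefactor)


def fisher_ferrari_p_value(pvalues, corr):
    """Gamma-approximation p-value for Fisher's statistic.

    corr is the correlation matrix of the original variables (assumed input);
    its squared off-diagonal entries stand in for the log p-value correlations.
    """
    m = len(pvalues)
    x = sum(-2.0 * math.log(p) for p in pvalues)
    rho_sum = sum(corr[i][j] ** 2 for i in range(m) for j in range(i + 1, m))
    nu = 2.0 * (1.0 + 2.0 * rho_sum / m)
    return gamma_sf(x, shape=2.0 * m / nu, scale=nu)


# Hypothetical example: two metabolites with correlation 0.6.
print(fisher_ferrari_p_value([0.01, 0.04], [[1.0, 0.6], [0.6, 1.0]]))
```

Two limiting cases give a quick sanity check: with zero correlations the result reduces to the chi-squared p-value with 2m degrees of freedom, and with two perfectly correlated metabolites sharing the same p-value the combined p-value equals that p-value.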


Next, the Type I error was assessed for various combinations. All computations and simulations were 
performed with R version 4.2.2 (R Core Team, 2023). Let $n = n_1 = n_2$ be the number of observations per 
group, and let $m$ represent the total number of metabolites in the pathway. Let CS($\rho$) represent the 
$m$ by $m$ compound symmetric correlation matrix with correlation $\rho$, i.e., the correlation of each 
pair of metabolites is the same and equal to $\rho$. For each combination of parameters, the Type I error 
was determined from 5,000 simulation runs. Table 2 gives the Type I error for various combinations of 
($n$, $m$, $\rho$). From Table 2, we see that most are close to the nominal level of 0.05, but for those 
with correlations = 0, the Type I error is under the nominal level, especially for the larger n and m, so 
in those cases the test is conservative. However, in practice, metabolites are grouped together in a 
pathway or similar classification because they are related to each other and should correlate with each 
other, so this case would be uncommon in real data sets. 


Next, the power was compared to that obtained by using the permutation distribution in (Mitchell, 2015). 
In particular, the power is compared for all the parameter combinations listed in S_Table1 and S_Table2 
of that publication (column “FP” is the power of the Fisher test with a permutation distribution). The 
values are listed in Table 3, where “FF” is the estimated power using the Fisher-Ferrari test. For these 
computations, 1,000 simulation runs were performed. The comparison of all of these is shown in Figure 1. 
The line y=x is included for reference, so points below the line have lower power for FF compared to FP. 
From Figure 1 we see the values are close for most combinations. Those combinations with CS(0) are 
indicated in blue and account for most of those with lower power than FP, which is expected given the 
results for the Type I error. The Spearman correlation between the two methods = 0.98. 


Application to a Human Metabolomics Data Set 


The above results are in reference to simulation studies, but real data are typically more complex. To 
compare the two tests using real data, the insulin resistance data set from (Mitchell, 2015) was used. The 
p-values for both tests are shown in Table 4, and the results are nearly identical. Since the FP p-values 
were determined from 10,000 permutations, the lowest p-values for that method were “<0.0001” in that 
publication. Some variation would also be expected due to the permutation itself (i.e., a second run of 
10,000 permutations would give slightly different results). The Fisher-Ferrari test is able to provide 
more precise p-values here, which is helpful in ranking the pathways and can affect multiple comparison 
corrections such as Bonferroni adjustments. 
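
To make the ranking point concrete, the Python sketch below applies a Bonferroni adjustment to a few of the FF p-values listed in Table 4. The pathway count of 39 is taken from the number of rows in Table 4; the function name is hypothetical. The first three pathways shown all have an FP value of “<0.0001”, so they are tied under the permutation approach but separable under FF.

```python
# FF p-values for a subset of pathways, copied from Table 4.
ff_p_values = {
    "Benzoate Metabolism": 1.13e-07,
    "Leucine, Isoleucine and Valine Metabolism": 1.48e-07,
    "Glycine, Serine and Threonine Metabolism": 3.91e-07,
    "TCA Cycle": 0.0016,
}
N_PATHWAYS = 39  # number of pathways (rows) in Table 4


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


adjusted = {name: bonferroni(p, N_PATHWAYS) for name, p in ff_p_values.items()}
ranked = sorted(adjusted, key=adjusted.get)

for name in ranked:
    print(f"{name}: adjusted p = {adjusted[name]:.3g}")
```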


Discussion 


Fisher's statistic, which can be used to combine p-values, is a very useful test for assessing the 
statistical significance of a metabolic pathway or other grouping of metabolites. This statistic is based 
on the sum of the natural logarithms of the p-values. Because the log p-values are correlated, the 
distribution of the statistic does not have a closed form. The p-values can be derived empirically by 
permuting the group labels, but this can be very time-consuming, and the smallest p-value will be limited 
by the number of permutations. To address these issues, a gamma approximation (Ferrari, 2019) to the 
distribution of this statistic can be used, with the correlation of the log p-values approximated by the 
square of the correlation of the metabolites. From the simulation studies, this test, which we refer to 
as the Fisher-Ferrari test, has similar Type I error and power to the Fisher statistic with the 
permutation-derived null distribution except for some cases with low correlations, where the 
Fisher-Ferrari test is more conservative. However, in real data these cases would be expected to be 
uncommon since metabolites in the same pathway would be expected to have at least modest correlations. 
This was seen in the application to the insulin resistance human metabolomics study (Mitchell, 2015), 
where the results of the two methods agreed very well, but the Fisher-Ferrari test was able to provide 
p-values for the cases reported as “<0.0001” by the other method. 
Tables and Figures 


Table 1: Correlation of the log p-values vs. the correlation of the original variables for 4 scenarios 
(COR1, COR2, COR3, COR4). N is the sample size per group, RHO is the correlation between the original 
variables, and RHO SQ is the square of RHO. 


N RHO RHO SQ COR1 COR2 COR3 COR4 


3 -0.9 0.81 0.72 0.71 0.70 0.68 
3 -0.75 0.5625 0.46 0.46 0.44 0.43 
3 -0.5 0.25 0.19 0.15 0.19 0.18 
3 -0.25 0.0625 0.05 0.06 0.03 0.06 
3 0 0 0.02 -0.03 0.01 -0.01 
3 0.25 0.0625 0.06 0.03 0.04 0.04 
3 0.5 0.25 0.18 0.18 0.18 0.19 
3 0.75 0.5625 0.43 0.43 0.43 0.43 
3 0.9 0.81 0.72 0.70 0.70 0.69 
5 -0.9 0.81 0.77 0.77 0.76 0.76 
5 -0.75 0.5625 0.50 0.50 0.49 0.49 
5 -0.5 0.25 0.22 0.22 0.23 0.20 
5 -0.25 0.0625 0.07 0.05 0.05 0.04 
5 0 0 0.01 0.00 0.00 -0.01 
5 0.25 0.0625 0.04 0.05 0.05 0.03 
5 0.5 0.25 0.21 0.23 0.19 0.21 
5 0.75 0.5625 0.48 0.50 0.48 0.48 
5 0.9 0.81 0.78 0.76 0.76 0.75 
10 -0.9 0.81 0.79 0.79 0.79 0.79 
10 -0.75 0.5625 0.52 0.52 0.52 0.53 
10 -0.5 0.25 0.26 0.22 0.23 0.22 
10 -0.25 0.0625 0.05 0.07 0.04 0.04 
10 0 0 0.01 0.01 -0.02 0.00 
10 0.25 0.0625 0.07 0.07 0.06 0.05 
10 0.5 0.25 0.25 0.22 0.23 0.21 
10 0.75 0.5625 0.54 0.54 0.53 0.53 
10 0.9 0.81 0.79 0.79 0.79 0.79 
20 -0.9 0.81 0.80 0.80 0.79 0.79 
20 -0.75 0.5625 0.53 0.53 0.54 0.54 
20 -0.5 0.25 0.25 0.24 0.20 0.24 
20 -0.25 0.0625 0.07 0.05 0.05 0.06 
20 0 0 0.02 -0.02 0.01 0.00 
20 0.25 0.0625 0.05 0.07 0.06 0.06 
20 0.5 0.25 0.24 0.26 0.25 0.20 
20 0.75 0.5625 0.55 0.55 0.54 0.55 
20 0.9 0.81 0.80 0.79 0.80 0.79 


30 -0.9 0.81 0.80 0.81 0.79 0.80 
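(Remaining rows for N = 30 and N = 50. Only three of the four COR columns survive in the source for these 
rows; they are listed in source order, and their assignment to COR1-COR4 is not recoverable.) 

30 -0.75 0.5625 0.55 0.55 0.55 
30 -0.5 0.25 0.26 0.24 0.21 
30 -0.25 0.0625 0.04 0.06 0.05 
30 0 0 0.02 0.00 0.01 
30 0.25 0.0625 0.05 0.06 0.06 
30 0.5 0.25 0.26 0.23 0.21 
30 0.75 0.5625 0.55 0.55 0.55 
30 0.9 0.81 0.80 0.80 0.78 
50 -0.9 0.81 0.81 0.80 0.80 
50 -0.75 0.5625 0.56 0.55 0.54 
50 -0.5 0.25 0.23 0.23 0.22 
50 -0.25 0.0625 0.06 0.06 0.06 
50 0 0 -0.01 -0.01 -0.01 
50 0.25 0.0625 0.04 0.04 0.05 
50 0.5 0.25 0.23 0.21 0.23 
50 0.75 0.5625 0.55 0.54 0.56 
50 0.9 0.81 0.80 0.80 0.80 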
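
The pattern in Table 1 can be probed with a small Monte Carlo sketch in Python. This is not the simulation used to produce Table 1. As a simplifying assumption it draws pairs of correlated standard-normal z statistics directly (a large-sample stand-in for the two t statistics) and converts them to two-sided normal p-values. The function names, seed, and number of draws are hypothetical.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_p_correlation(rho, n_sims=20000, seed=1):
    """Correlation of log p-values for two null tests whose statistics have correlation rho."""
    rng = random.Random(seed)
    log_p1, log_p2 = [], []
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)
        # Build z2 so that corr(z1, z2) = rho (assumed large-sample behaviour).
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        log_p1.append(math.log(two_sided_p(z1)))
        log_p2.append(math.log(two_sided_p(z2)))
    return pearson(log_p1, log_p2)


for rho in (0.0, 0.5, 0.9):
    print(rho, rho ** 2, round(log_p_correlation(rho), 3))
```

Under this simplification the estimated correlation of the log p-values should land near the square of rho, in line with the larger-N rows of Table 1.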


Table 2: Type I error of the Fisher-Ferrari test for various sample sizes (n), pathway sizes (m), and 
correlations (RHO) 


n m RHO Type I Error 


5 2 0 0.037 
5 2 0.5 0.0428 
5 2 0.75 0.0462 
5 2 0.9 0.0448 
5 3 0 0.0286 
5 3 0.5 0.0396 
5 3 0.75 0.0434 
5 3 0.9 0.0424 
5 5 0 0.0226 
5 5 0.5 0.036 
5 5 0.75 0.0426 
5 5 0.9 0.0422 
5 10 0 0.0074 
5 10 0.5 0.0344 
5 10 0.75 0.0412 
5 10 0.9 0.0436 
10 2 0 0.0416 
10 2 0.5 0.0478 
10 2 0.75 0.0494 
10 2 0.9 0.0508 
10 3 0 0.0418 
10 3 0.5 0.0438 
10 3 0.75 0.0516 
10 3 0.9 0.0512 
10 5 0 0.038 
10 5 0.5 0.047 
10 5 0.75 0.0508 
10 5 0.9 0.049 
10 10 0 0.0218 
10 10 0.5 0.0474 
10 10 0.75 0.0524 
10 10 0.9 0.0508 
25 2 0 0.0494 
25 2 0.5 0.0506 
25 2 0.75 0.0514 
25 2 0.9 0.0492 
25 3 0 0.0484 
25 3 0.5 0.0488 
25 3 0.75 0.0482 
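25 3 0.9 0.0526 
25 5 0 0.0448 
25 5 0.5 0.05 
25 5 0.75 0.0524 
25 5 0.9 0.0522 
25 10 0 0.0358 
25 10 0.5 0.055 
25 10 0.75 0.049 
25 10 0.9 0.0496 

(For the remaining 16 rows, the value of n is missing in the source and is shown as "?".) 

? 2 0 0.047 
? 2 0.5 0.0474 
? 2 0.75 0.0522 
? 2 0.9 0.0492 
? 3 0 0.0426 
? 3 0.5 0.0544 
? 3 0.75 0.0516 
? 3 0.9 0.0518 
? 5 0 0.045 
? 5 0.5 0.0482 
? 5 0.75 0.0542 
? 5 0.9 0.0494 
? 10 0 0.0456 
? 10 0.5 0.0552 
? 10 0.75 0.0506 
? 10 0.9 0.0516 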


Table 3: Comparison of power of Fisher's test with permutation-derived p-values (FP) vs. the 
Fisher-Ferrari test (FF) 


rho is the pairwise correlation in the CS(rho) covariance matrix, n is the number of samples per group, m 
is the number of variables in the pathway 


MU refers to the mean difference between the two simulated groups: m11 = (0.15,0.15,0.15,0.15), 
m12 = (0.3,0.3,0.3,0.3), m13 = (0.3,0,0,0), m21 = (0.15,0.15,0.15,0.15,0.15,0.15,0.15,0.15), m22 = 
(0.3,0.3,0.3,0.3), m23 = (0.3,0,0,0,0,0,0,0) 


SIGMA is the vector of standard deviations for each variable (assumed to be the same for both groups) 
where s11 = (0.3, 0.3, 0.3, 0.3), s12 = (0.15, 0.25, 0.35, 0.45), s21 = (0.3,0.3,0.3,0.3,0.3,0.3,0.3,0.3), s22 = 
(0.15, 0.25, 0.35, 0.45, 0.15, 0.25, 0.35, 0.45) 


MU SIGMA rho n m FF FP 

m11 s11 0.9 5 4 0.101 0.118 
m11 s12 0.9 5 4 0.125 0.134 
m11 s11 0.7 5 4 0.102 0.121 
m11 s12 0.7 5 4 0.139 0.159 
m11 s11 0.5 5 4 0.125 0.159 
m11 s12 0.5 5 4 0.145 0.213 
m11 s11 0 5 4 0.104 0.203 
m11 s12 0 5 4 0.169 0.271 
m12 s11 0.9 5 4 0.294 0.323 
m12 s12 0.9 5 4 0.384 0.413 
m12 s11 0.7 5 4 0.322 0.358 
m12 s12 0.7 5 4 0.466 0.497 
m12 s11 0.5 5 4 0.379 0.403 
m12 s12 0.5 5 4 0.537 0.574 
m12 s11 0 5 4 0.53 0.605 
m12 s12 0 5 4 0.701 0.82 
m13 s11 0.9 5 4 0.061 0.101 
m13 s11 0.7 5 4 0.091 0.084 
m13 s11 0.5 5 4 0.061 0.118 
m13 s11 0 5 4 0.089 0.159 
m11 s11 0.9 10 4 0.217 0.211 
m11 s12 0.9 10 4 0.247 0.255 
m11 s11 0.7 10 4 0.207 0.226 
m11 s12 0.7 10 4 0.322 0.311 
m11 s11 0.5 10 4 0.257 0.28 
m11 s12 0.5 10 4 0.379 0.37 
m11 s11 0 10 4 0.337 0.379 
m11 s12 0 10 4 0.484 0.552 
m12 s11 0.9 10 4 0.595 0.598 
Table 4: Comparison of p-values from the Fisher-Ferrari test (FF) vs. the Fisher test with the 
permutation-derived p-value (FP), sorted by the Fisher-Ferrari p-values 


PATHWAY m FP FF 

Methionine, Cysteine, SAM and Taurine Metabolism 5 <0.0001 <1e-16 
Benzoate Metabolism 3 <0.0001 1.13E-07 
Leucine, Isoleucine and Valine Metabolism 13 <0.0001 1.48E-07 
Glycine, Serine and Threonine Metabolism 5 <0.0001 3.91E-07 
Lysolipid 24 1.00E-04 4.46E-06 
Creatine Metabolism 2 1.00E-04 5.20E-06 
Purine Metabolism, (Hypo)Xanthine/Inosine containing 3 <0.0001 1.21E-05 
Nicotinate and Nicotinamide Metabolism 2 <0.0001 1.42E-05 
Fructose, Mannose and Galactose Metabolism 3 <0.0001 2.41E-05 
Glycolysis, Gluconeogenesis, and Pyruvate Metabolism 5 1.00E-04 8.87E-05 
Phenylalanine and Tyrosine Metabolism 8 3.00E-04 0.0002 
TCA Cycle 4 0.0019 0.0016 
Fatty Acid Metabolism (also BCAA Metabolism) 2 0.0021 0.0016 
Long Chain Fatty Acid 11 0.0048 0.0022 
Fatty Acid Metabolism(Acyl Carnitine) 8 0.0048 0.0027 
Polyunsaturated Fatty Acid (n3 and n6) 10 0.0064 0.0029 
Polypeptide 3 0.0039 0.0033 
Phospholipid Metabolism 2 0.0062 0.0051 
Tryptophan Metabolism 8 0.0078 0.0055 
Medium Chain Fatty Acid 7 0.0199 0.0152 
Steroid 14 0.0344 0.0273 
Gamma-glutamyl Amino Acid 7 0.0411 0.0371 
Glycerolipid Metabolism 2 0.0449 0.0432 
Purine Metabolism, Adenine containing 2 0.0475 0.0445 
Food Component/Plant 6 0.0459 0.0476 
Hemoglobin and Porphyrin Metabolism 5 0.0504 0.0517 
Monoacylglycerol 2 0.0511 0.0521 
Urea cycle; Arginine and Proline Metabolism 9 0.0604 0.0596 
Fatty Acid, Monohydroxy 2 0.0857 0.0845 
Xanthine Metabolism 4 0.1835 0.2004 
Glutamate Metabolism 3 0.2697 0.2778 
Carnitine Metabolism 2 0.3364 0.3421 
Secondary Bile Acid Metabolism 6 0.3595 0.3683 
Sterol 3 0.3927 0.3910 
Pyrimidine Metabolism, Uracil containing 2 0.4858 0.4766 
Lysine Metabolism 4 0.5114 0.5088 
Alanine and Aspartate Metabolism 4 0.7924 0.7757 
Primary Bile Acid Metabolism 3 0.8175 0.8027 
Fatty Acid, Dicarboxylate 2 0.8006 0.8078 


Figure 1: Comparison of power of the Fisher-Ferrari test (FF) vs. the Fisher test with the 
permutation-derived null distribution (FP) 


The red line is the line x=y, so points above the line are where FF has greater power than FP and points 
below the line are where FF has lower power than FP. 
The blue points are those with rho=0 as a parameter. 